New Transcription System using Automatic Speech Recognition (ASR)

Author

Tatsuya Kawahara

Abstract

Technical Consultant of the House

The Japanese Parliament (Diet) was founded in 1890. From the very first session, verbatim records were made by manual shorthand for over a hundred years. Early in this century, however, the government stopped recruiting stenographers and investigated alternative methods, including ASR technology. The House of Representatives chose ASR for its new system, which was deployed and tested in 2010 and has been in official operation since April 2011.

The new system handles all plenary sessions and committee meetings. Speech is captured by the stand microphones in the meeting rooms, with separate channels for interpellators and ministers. The speaker-independent ASR system generates an initial draft, which is then corrected by reporters. Roughly speaking, the system's recognition error rate is around 10%, and disfluencies and colloquial expressions that must be corrected account for another 10%. Reporters therefore still play an important role.

There are also issues specific to Japanese. First, kana phonetic symbols must be converted to kanji (Chinese characters). This conversion is often ambiguous because of the many homonyms, which makes real-time typing very hard; only a limited number of stenographers using special keyboards can do it. Moreover, the spoken style differs from the transcript style, so utterances must be rephrased in many cases, and re-speaking (shadow speaking) is therefore not straightforward.

The requirements for the ASR system are as follows. The first is high accuracy; over 90% is preferred. This is easily achieved in plenary sessions but is difficult in committee meetings, where speech is interactive, spontaneous, and often heated. The second requirement is fast turnaround. In the House, a meeting session is divided into 5-minute segments, each assigned to a reporter, so ASR must run almost in real time for reporters to start working promptly, even while the session is still in progress.
The third requirement is compliance with the House's orthodox transcript guidelines. To that end, the electronic dictionary of 60K lexical entries used in the system was proofread. In summary, the compliance issue is solved by hard work, fast turnaround is feasible with current computers, and high accuracy is the most technically challenging requirement.
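The abstract quotes an error rate of about 10% and an accuracy target of over 90% without defining the metric. For Japanese, which is written without spaces between words, ASR accuracy is commonly measured per character: the Levenshtein edit distance (substitutions, deletions, insertions) between the reference transcript and the ASR output, divided by the reference length. The Python sketch below assumes that character-level convention; the function names and the sample sentence are illustrative and are not taken from the House system.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions, deletions and insertions
    needed to turn the reference string into the hypothesis string."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: r missing from hyp
                curr[j - 1] + 1,         # insertion: extra h in hyp
                prev[j - 1] + (r != h),  # substitution, free when characters match
            )
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalised by reference length (may exceed 1.0)."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def character_accuracy(ref: str, hyp: str) -> float:
    """Accuracy as 1 - CER; negative when insertions dominate."""
    return 1.0 - character_error_rate(ref, hyp)


if __name__ == "__main__":
    reference = "これより会議を開きます"   # hypothetical reference sentence (11 characters)
    hypothesis = "これより会議を開けます"  # one substituted character
    print(edit_distance(reference, hypothesis))                  # 1
    print(round(character_error_rate(reference, hypothesis), 3))  # 0.091
```

Under this metric, a homonym substitution such as 汽車 (train) for 記者 (reporter), both read きしゃ, counts as two character errors even though the reading is correct. Because insertions are counted, the error rate can exceed 100% and the accuracy can fall below zero.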

Similar resources

Incorporating named entity recognition into the speech transcription process

Named Entity Recognition (NER) from speech usually involves two sequential steps: transcribing the speech using Automatic Speech Recognition (ASR) and annotating the outputs of the ASR process using NER techniques. Recognizing named entities in automatic transcripts is difficult due to the presence of transcription errors and the absence of some important NER clues, such as capitalization and p...

Overview of Automatic Speech Recognition for Transcription System in the Japanese Parliament (Diet)

This article describes a new automatic transcription system in the Japanese Parliament which deploys our automatic speech recognition (ASR) technology and has been in official operation since April 2011. The speaker-independent ASR system handles all plenary sessions and committee meetings to generate an initial draft, which is corrected by Parliamentary reporters. To achieve high recognition p...

Transcription System Using Automatic Speech Recognition for the Japanese Parliament (Diet)

This article describes a new automatic transcription system in the Japanese Parliament which deploys our automatic speech recognition (ASR) technology. To achieve high recognition performance in spontaneous meeting speech, we have investigated an efficient training scheme with minimal supervision which can exploit a huge amount of real data. Specifically, we have proposed a lightly-supervised t...

Improving the performance of continuous speech recognition systems using features extracted from speech manifolds in the reconstructed phase space

Designing new methods for extracting features from the speech signal, and combining the information they yield, is one of the most effective approaches to improving the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal has nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...

How to evaluate ASR output for named entity recognition?

The standard metric to evaluate automatic speech recognition (ASR) systems is the word error rate (WER). WER has proven very useful in stand-alone ASR systems. Nowadays, these systems are often embedded in complex natural language processing systems to perform tasks like speech translation, man-machine dialogue, or information retrieval from speech. This exacerbates the need for the speech proce...

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by enabling natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signals has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

Publication date: 2011
